Abstract:
Despite significant recent progress on generative models, controlled generation of images depicting multiple and complex object layouts is still a difficult problem. Among the core challenges are the diversity of appearance a given object may possess and, as a result, the exponentially large set of images consistent with a specified layout. To address these challenges, we propose a novel approach for layout-based image generation, which we call Layout2Im. Given a coarse spatial layout (bounding boxes + object categories), our model can generate a set of realistic images which have the correct objects in the desired locations. The representation of each object is disentangled into a specified/certain part (category) and an unspecified/uncertain part (appearance). The category is encoded using a word embedding and the appearance is distilled into a low-dimensional vector sampled from a normal distribution. Individual object representations are composed together using a convolutional LSTM to obtain an encoding of the complete layout, which is then decoded to an image. Several loss terms are introduced to encourage accurate and diverse image generation. The proposed Layout2Im model significantly outperforms the previous state of the art, boosting the best reported inception score by 24.66% and 28.57% on the very challenging COCO-Stuff and Visual Genome datasets, respectively. Extensive experiments also demonstrate our model's ability to generate complex and diverse images with many objects.
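To make the disentangled object representation concrete, the following is a minimal PyTorch sketch of the idea just described. It is not the authors' implementation: the module names, the illustrative dimensions, and the single gated-convolution update standing in for one convolutional LSTM step are all assumptions. Each object is encoded as a category word embedding concatenated with an appearance vector drawn from a standard normal, broadcast over its bounding box, and the per-object maps are fused sequentially into a layout encoding.

```python
import torch
import torch.nn as nn

NUM_CATEGORIES, EMB_DIM, APP_DIM = 80, 64, 32  # illustrative sizes

class ObjectEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.category_emb = nn.Embedding(NUM_CATEGORIES, EMB_DIM)

    def forward(self, category_ids):
        # Specified part: category word embedding.
        cat = self.category_emb(category_ids)             # (num_objs, EMB_DIM)
        # Unspecified part: appearance code sampled from a normal distribution.
        app = torch.randn(category_ids.size(0), APP_DIM)  # (num_objs, APP_DIM)
        return torch.cat([cat, app], dim=1)

def place_in_layout(obj_codes, boxes, size=16):
    """Broadcast each object's code over its normalized bounding box."""
    n, d = obj_codes.shape
    maps = torch.zeros(n, d, size, size)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        xs, ys = int(x0 * size), int(y0 * size)
        xe, ye = max(xs + 1, int(x1 * size)), max(ys + 1, int(y1 * size))
        maps[i, :, ys:ye, xs:xe] = obj_codes[i].view(d, 1, 1)
    return maps

# One gated convolution update stands in for a convolutional-LSTM step.
fuse = nn.Conv2d(2 * (EMB_DIM + APP_DIM), EMB_DIM + APP_DIM, 3, padding=1)

def compose(maps):
    hidden = torch.zeros(1, maps.size(1), maps.size(2), maps.size(3))
    for m in maps:  # fold the objects in one at a time
        hidden = torch.tanh(fuse(torch.cat([hidden, m.unsqueeze(0)], dim=1)))
    return hidden   # layout encoding, later decoded to an image

codes = ObjectEncoder()(torch.tensor([3, 17]))  # two objects by category id
layout = compose(place_in_layout(codes, [(0.1, 0.1, 0.5, 0.6), (0.4, 0.3, 0.9, 0.9)]))
print(layout.shape)  # torch.Size([1, 96, 16, 16])
```

Resampling the appearance vectors while keeping the layout fixed yields different images consistent with the same bounding boxes, which is the source of the model's diversity.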
Abstract:
The creation of an image from another image or from different types of data, including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, manually capturing images of an object or a product from different views can be exhausting and expensive. Using deep learning and artificial intelligence techniques, the generation of new images from different types of data has now become possible, and significant effort has recently been devoted to developing image generation strategies, with great success. To that end, we present in this paper, to the best of the authors’ knowledge, the first comprehensive overview of existing image generation methods. Accordingly, each image generation technique is described based on the nature of the adopted algorithms, the type of data used, and the main objective. Moreover, each image generation category is discussed by presenting the proposed approaches. In addition, existing image generation datasets are presented. The evaluation metrics suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to better characterize the state of the art and identify its limitations and strengths. Lastly, the current challenges facing this subject are presented.
Abstract:
Due to data scarcity and class imbalance in medical images, the training dataset seriously affects the classification accuracy of a model. We propose RetiGAN, a retinal image generation model based on GANs. A dual-scale discriminator trains the network at two scales to improve the quality of the generated images. A VGG network is embedded into RetiGAN to extract high-level semantic information from the original and generated images, so that, under the guidance of a content loss, RetiGAN better retains the semantic information of the original images. Besides, to enhance the details of the generated images, RetiGAN is guided to generate retinal images with clearer edges by feeding smoothed images to the discriminator and forcing it to distinguish the smoothed images from the original ones. Qualitative and quantitative analysis verifies that the generated retinal images are structurally similar to the original ones rather than simple copies. In addition, ablation experiments show that the model improves the resolution of the generated images, with better visibility and clearer edges. In summary, RetiGAN is superior to other retinal image generation models in preserving structural similarity and producing high-resolution images.
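As a rough illustration of the two loss ingredients described above, the sketch below implements a VGG-feature content loss and a discriminator loss that also rejects Gaussian-smoothed images. The VGG cut-off layer, blur parameters, and equal loss weighting are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16
from torchvision.transforms import GaussianBlur

# Frozen VGG-16 feature extractor; cutting at layer 16 is an assumption.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def content_loss(real, fake):
    """L2 distance between high-level VGG features of real and generated images."""
    return F.mse_loss(vgg_features(fake), vgg_features(real))

blur = GaussianBlur(kernel_size=9, sigma=3.0)  # smoothing strength is illustrative

def discriminator_loss(disc, real, fake):
    """Real vs. fake terms, plus a term making the discriminator reject
    smoothed real images, which pushes generation toward sharper edges."""
    real_logits = disc(real)
    fake_logits = disc(fake.detach())
    smooth_logits = disc(blur(real))
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
            + F.binary_cross_entropy_with_logits(smooth_logits, torch.zeros_like(smooth_logits)))
```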
Abstract:
Generative Adversarial Networks (GANs) are a type of deep learning architecture that uses two networks, a generator and a discriminator, which compete against each other to create realistic but previously unseen samples. They have become a popular research topic in recent years, particularly for image processing and synthesis, leading to many advances and applications in various fields. With the profusion of published works and interest from professionals in different areas, surveys on GANs are necessary, mainly for those who aim to start working on this topic. In this work, we cover the basics and notable architectures of GANs, focusing on their applications in image generation. We also discuss how the challenges in GAN architectures, such as mode coverage, stability, convergence, and evaluating image quality using metrics, have been addressed. © 2023 Elsevier Ltd. All rights reserved.
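For readers starting on this topic, the minimal sketch below shows the two-player game on toy fully connected networks: the discriminator is trained to separate real from generated samples, and the generator is trained to fool it. It is a generic illustration, not an architecture from the survey.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy generator/discriminator for flattened 28x28 images in [-1, 1].
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    z = torch.randn(real.size(0), 64)
    fake = G(z)

    # Discriminator: push real samples toward label 1, fakes toward 0.
    d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(real.size(0), 1))
              + F.binary_cross_entropy_with_logits(D(fake.detach()), torch.zeros(real.size(0), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make the discriminator label the fakes as real.
    g_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(train_step(torch.rand(8, 784) * 2 - 1))
```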
Abstract:
Semantic image synthesis methods learn to generate new images conditioned on predefined semantic label maps. Existing methods require access to a large volume of samples labeled with semantic maps, which limits their applications. We propose USIS, a Unified Semantic Image Synthesis model that can be trained on only a single pair or multiple pairs of images and semantic maps. Once trained, a USIS model can generate new images according to unseen semantic maps, as existing semantic image synthesis methods do. Specifically, we design a hierarchical architecture to reconstruct training samples and gradually learn the distributions of multi-scale patches in the samples from coarse to fine. To avoid error accumulation across scales, we propose a mixed training strategy to stabilize the training process. Extensive experiments on one- or multiple-sample datasets show that our proposed model achieves state-of-the-art performance in terms of visual fidelity. © 2022 Elsevier B.V. All rights reserved.
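The coarse-to-fine idea can be sketched as follows. This is a hypothetical simplification, not the USIS architecture itself: the layer widths, the scale factor of two, and the residual refinement at each level are assumptions. Each scale upsamples the previous scale's output and refines it, conditioned on the semantic map resized to that scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleGenerator(nn.Module):
    """One pyramid level: refines the upsampled coarser result,
    conditioned on the semantic map at this scale."""
    def __init__(self, sem_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + sem_channels, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, coarse_img, sem_map):
        residual = self.net(torch.cat([coarse_img, sem_map], dim=1))
        return torch.tanh(coarse_img + residual)

def synthesize(generators, sem_map, base_size=16):
    """Coarse-to-fine synthesis over len(generators) scales."""
    img = torch.zeros(sem_map.size(0), 3, base_size, base_size)
    for i, gen in enumerate(generators):
        size = base_size * (2 ** i)
        img = F.interpolate(img, size=size, mode="bilinear", align_corners=False)
        sem = F.interpolate(sem_map, size=size, mode="nearest")
        img = gen(img, sem)
    return img

gens = nn.ModuleList([ScaleGenerator() for _ in range(3)])  # 16 -> 32 -> 64
out = synthesize(gens, torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```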
Abstract:
In this paper, the set-convexity and mapping-convexity properties of the extended images of generalized systems are considered. By using these image properties and tools of topological linear spaces, separation schemes ensuring the impossibility of generalized systems are developed. Then, special problem classes are investigated.
Abstract:
Synthesizing images from descriptive text is an exciting and challenging task in multimodal deep learning, with broad prospects for application in visual reasoning, image editing, style transfer, and so on. This paper proposes SWF-GAN to address two problems: the limited constraint provided by coarse-grained information makes it difficult to build accurate text-to-image semantic mappings, and ordinary mask predictors lack the representational capacity to accurately perceive the global information of images. SWF-GAN designs a sentence-word fusion perceptual module that divides the semantic perception of the generative model into two layers, sentence and word: it builds affine transformations that constrain image synthesis using coarse-grained sentence-level features, and refines the synthesis using fine-grained word-level features. Additionally, a weakly supervised coordinate mask predictor is employed in the sentence layer; it extracts long-range dependencies with precise positional information, vertically and horizontally, to assign more information to the subject against a complex image background and thus accurately generate the structure of the target object. Experiments show that the proposed sentence-word fusion perceptual generative adversarial network can generate clearer and more lively images without a heavy computational burden. Compared with the baseline model, the proposed model improves the IS and FID scores by 0.97% and 22.95%, respectively, and results on different datasets and an ablation study demonstrate the effectiveness of our model. © 2023 Elsevier Ltd. All rights reserved.
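The sentence-level affine constraint resembles conditional feature modulation. The sketch below is a hypothetical simplification rather than SWF-GAN's actual module: it predicts a per-channel scale and shift from a sentence embedding and applies them to intermediate image features.

```python
import torch
import torch.nn as nn

class SentenceAffine(nn.Module):
    """Predict per-channel scale (gamma) and shift (beta) from a sentence
    embedding and modulate image features: out = gamma * feat + beta."""
    def __init__(self, sent_dim=256, channels=64):
        super().__init__()
        self.to_gamma = nn.Linear(sent_dim, channels)
        self.to_beta = nn.Linear(sent_dim, channels)

    def forward(self, feat, sent_emb):
        gamma = self.to_gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
        return gamma * feat + beta

affine = SentenceAffine()
feat = torch.randn(2, 64, 32, 32)   # intermediate image features
sent = torch.randn(2, 256)          # sentence-level text embedding
print(affine(feat, sent).shape)     # torch.Size([2, 64, 32, 32])
```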
Abstract:
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework to generate diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code representing the style of the resulting face; the generator translates the input face mask into a real face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
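The interplay of the four components can be summarized in the minimal PyTorch sketch below. The stand-in linear networks and the L1 style-reconstruction term are assumptions for illustration; the paper's networks are deep convolutional models and its losses may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STYLE_DIM = 8

# Stand-in networks; the real components are deep convolutional models.
style_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, STYLE_DIM))
style_decoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, STYLE_DIM))
generator = nn.Sequential(nn.Linear(64 * 64 + STYLE_DIM, 64 * 64), nn.Tanh())
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))

def generate(mask, style_code):
    """Translate a face mask into a face image conditioned on a style code."""
    inp = torch.cat([mask.flatten(1), style_code], dim=1)
    return generator(inp).view(-1, 1, 64, 64)

def generator_losses(mask, real_face):
    style = style_encoder(real_face)            # style of a reference face
    fake = generate(mask, style)
    adv = F.binary_cross_entropy_with_logits(   # fool the discriminator
        discriminator(fake), torch.ones(mask.size(0), 1))
    # Style reconstruction: the style decoder must recover the code from the
    # generated face, so the style actually influences the output.
    style_rec = F.l1_loss(style_decoder(fake), style)
    return adv + style_rec

mask = torch.rand(2, 1, 64, 64)
face = torch.rand(2, 1, 64, 64)
print(generator_losses(mask, face).item())
```

Because the style code is recoverable from the output, sampling different codes for one mask produces visibly different faces, which is what gives the model its diversity and controllability.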
Abstract:
In recent years, generative adversarial networks (GANs) have gained tremendous popularity for various imaging-related tasks such as artificial image generation to support AI training. GANs are especially useful for medical imaging tasks, where training datasets are usually limited in size and heavily imbalanced against the diseased class. We present a systematic review, following the PRISMA guidelines, of recent GAN architectures used for medical image analysis, to help readers make an informed decision before employing GANs in developing medical image classification and segmentation models. We extracted 54 papers, published between January 2015 and August 2020, that highlight the capabilities and applications of GANs in medical imaging and meet the inclusion criteria for meta-analysis. Our results show four main GAN architectures used for segmentation or classification in medical imaging. We provide a comprehensive overview of recent trends in the application of GANs to clinical diagnosis through medical image segmentation and classification, and ultimately share experiences for task-based GAN implementations.
Abstract:
We experimentally demonstrate a motion picture imaging technique that records a magnified image of light pulse propagation while extending the recordable time of digital light-in-flight recording by holography. We constructed an optical system that achieves both the recordable-time extension and the observation of a magnified image of light pulse propagation. As a result, we experimentally succeeded in recording light pulse propagation at a magnification of 7.45 while extending the recordable time. The recordable time of the motion picture was 714 fs, twice that of the conventional technique. © 2021 Optica Publishing Group